Regex:Quotation elimination
Quotation elimination is a bit of regex that will help you eliminate quotation marks. It is typically used with the replace.py script.

Code

python replace.py -regex "\"\'\'|\'\'\"" '"' -start:!

Explanation

One of the trickier things to do with regex is to replace quotation marks, since quotation marks are also what replace.py needs on the command line to mark out the text you wish to replace. The above script imagines situations where editors have followed Memory Alpha's odd practice of italicizing within a double quote, that is, putting the wiki italics markup (two single quotes, '') just inside the double quotation marks. An example of the practice would be:

"''He's dead, Jim.''"

The challenge is thus to eliminate the pairs of single quotes, but keep the double ones. You've thus got to escape all the quotation marks in the search pattern and put back only the ones you want to keep.

Also, you don't want to make two runs, one for the left and one for the right. So you start with the left condition (\"\'\'), add a pipe (|), then add the right condition (\'\'\"). The result is that it looks for both sides simultaneously.

Note that in the bit you're looking for ("\"\'\'|\'\'\"") you use double quotes to tell replace.py what terms you're looking for. But in the replacement ('"') you use single quotes.

The result of this whole thing is that it finds the single quotes to the left of He's, and the single quotes to the right of Jim. Hence, the final output is:

"He's dead, Jim."

Known limitations

This isn't something you should run automatically. There are legitimate cases which will be destroyed, and you could easily create "italicisation havoc". For instance, if the quote started or ended with the title of a work, you could easily make huge sections of your article italic. Imagine this:

Leonard Nimoy once said, "''The Search for Spock'' was quite a learning curve for me."

This regex would create an italicisation nightmare, because it would result in:

Leonard Nimoy once said, "The Search for Spock'' was quite a learning curve for me."

Obviously, this would mean that The Search for Spock wouldn't be italicised at all. Instead, everything after the title would be.

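The behaviour described so far can be tried out without touching a wiki. The following is a minimal sketch, assuming that the -regex option of replace.py matches the way Python's re module does; the names MAIN_PATTERN and strip_italics_at_quotes are made up for this illustration and are not part of replace.py. The pattern is the one from the Code section with the command-line escaping removed.

```python
import re

# Assumption: this mirrors the search pattern from the Code section once the
# command-line escaping is removed: either "'' (left side) or ''" (right side).
MAIN_PATTERN = re.compile("\"''|''\"")


def strip_italics_at_quotes(text):
    """Hypothetical helper: replace every "'' or ''" with a bare double quote."""
    return MAIN_PATTERN.sub('"', text)


# The case the regex was written for: italics markup just inside the quotes.
good = "\"''He's dead, Jim.''\""

# The problem case: only the title at the start of the quote is italic.
bad = ("Leonard Nimoy once said, \"''The Search for Spock'' "
       "was quite a learning curve for me.\"")

print(strip_italics_at_quotes(good))
# "He's dead, Jim."
print(strip_italics_at_quotes(bad))
# Leonard Nimoy once said, "The Search for Spock'' was quite a learning curve for me."
```

In the second result the opening italics markup is gone but the closing one is still there, which is exactly the one-sided stripping that causes the trouble.
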
So make sure you only use this regex in manual mode.

Variants

The above code will return a lot of false positives, as the previous section shows. To be more precise, you might try something like this:

python replace.py -regex "\'\'(\".*?\")\'\'" "\1" -start:"!" -summary:"getting rid of italicisation of quotations, in the case where italics code precedes double quotation marks"

This differs from the main code on this page in that it finds the pattern:

''"quotation"''

and rewrites it as:

"quotation"

In other words, it looks at both the left and right sides of a quotation, but only removes the italics markup when it finds it on both sides. The main code on the page, by contrast, strips both sides only because it's stripping the left and right sides individually. It's possible for the main code on the page to strip only a left side or a right side, if it happens to find a sentence that has only one side or the other.

Category:regex that deletes stuff
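
The difference between the paired variant and the main code can be seen side by side in a short sketch. As before, this assumes that replace.py's -regex option behaves like Python's re module, and the names PAIRED_PATTERN, MAIN_PATTERN, unwrap_italic_quotation and strip_italics_at_quotes are hypothetical, chosen only for this illustration.

```python
import re

# Assumption: the variant's search pattern with the command-line escaping
# removed: '' then a double-quoted run (captured), then '' again.
PAIRED_PATTERN = re.compile("''(\".*?\")''")

# The main pattern from the top of the page, for comparison.
MAIN_PATTERN = re.compile("\"''|''\"")


def unwrap_italic_quotation(text):
    """Hypothetical helper: drop the italics markup only when both sides are present."""
    return PAIRED_PATTERN.sub(r"\1", text)


def strip_italics_at_quotes(text):
    """Hypothetical helper: the main code, which strips each side independently."""
    return MAIN_PATTERN.sub('"', text)


two_sided = "''\"He's dead, Jim.\"''"
one_sided = "''\"Only the left side has markup.\""

print(unwrap_italic_quotation(two_sided))
# "He's dead, Jim."
print(unwrap_italic_quotation(one_sided))
# ''"Only the left side has markup."
print(strip_italics_at_quotes(one_sided))
# "Only the left side has markup."
```

The variant leaves the one-sided sentence alone, while the main pattern strips the lone left side, so the variant is the safer of the two when a page mixes both kinds of markup.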